ABSTRACT

Recent studies have noted that vertex degree in the autonomous system (AS) graph exhibits a highly variable distribution [15, 22]. The most prominent explanatory model for this phenomenon is the Barabasi-Albert (B-A) model [5, 2]. A central feature of the B-A model is preferential connectivity, meaning that the likelihood that a new node in a growing graph will connect to an existing node is proportional to the existing node's degree. In this paper we ask whether a more general explanation than the B-A model, one that does not assume preferential connectivity, is consistent with empirical data. We are motivated by two observations: first, AS degree and AS size are highly correlated [11]; and second, highly variable AS size can arise simply through exponential growth. We construct a model incorporating exponential growth in the size of the Internet and in the number of ASes. We then show analytically that such a model yields a size distribution exhibiting a power-law tail. In such a model, if an AS's link formation is roughly proportional to its size, then AS degree will also show high variability. We instantiate such a model with empirically derived estimates of growth rates and show that the resulting degree distribution is in good agreement with that of real AS graphs.
1. INTRODUCTION

Many aspects of the Internet's structure are relatively unknown. These gaps in our knowledge pose problems when attempting to construct representative network topologies for simulation and modeling. In addition, filling these gaps may shed light on the forces behind the Internet's growth and the ways in which the network may fail.

One aspect of the Internet's structure that has drawn great interest is the autonomous system (AS) graph (the graph in which vertices represent ASes and edges represent AS-AS peering relationships). A particularly surprising aspect of these graphs is that vertex degree generally has a highly variable distribution [15, 22].

In discussing properties of the AS graph, it is useful to distinguish between high variability and power-law tails. High variability is a qualitative notion, referring to a probability distribution showing non-negligible values over a wide range of scales (typically at least three orders of magnitude). In contrast, a distribution $p(\cdot)$ with power-law tails has the formal property that

$$p(x) \sim x^{-\alpha}$$

with $\alpha > 0$, where $a(x) \sim b(x)$ means that $\lim_{x \to \infty} a(x)/b(x) = c$ for some positive constant $c$.

Some authors have argued that AS vertex degree is well modeled as having power-law tails [15, 22]. Others have suggested that vertex degree does not clearly exhibit power-law tails, although it is highly variable [9]. Since such highly variable distributions do not arise in simple random graphs, and since power-law tails do provide a simple (albeit crude) approximation for the behavior of the true distribution, a number of papers have proposed mechanisms (more complicated than purely random connection) that may give rise to power-law degree distributions in graphs [5, 20, 19].

The most prominent model attempting to explain the emergence of power-law degree distributions is the Barabasi-Albert model (or B-A model) [5, 2]. In fact, it has been considered in a number of papers as a model for AS graphs [3, 7, 27, 24, 32]. The B-A model assumes the network is formed through incremental addition of nodes. In the simplest form of the model, a new node forms a connection to an existing node with probability proportional to the existing node's degree. This preferential connectivity leads to a "rich get richer" phenomenon in which high-degree nodes tend to increase in degree faster than low-degree nodes.

In this paper we examine whether explanations more general than the B-A model may suffice to explain highly variable degree distributions in the AS graph. We are motivated by two observations. First, the authors in [11] point out that AS degree is strongly correlated with AS size (measured in number of nodes), and that AS size also shows a highly variable distribution. Second, we observe that during the last 10 years or so, the Internet has undergone exponential growth in both the number of nodes and the number of ASes. Under such conditions, we show here that highly variable AS sizes (and, presumably as a consequence, highly variable AS degrees) may readily arise due to exponential growth alone.

We explore these observations in this paper by constructing a simple growth model for AS graphs. Our model makes three assumptions: (1) exponential growth in the number of hosts in the network; (2) exponential growth in the number of ASes in the network; and (3) an approximately proportional relationship between AS size and degree. The resulting model shows that highly variable AS degrees may easily arise without preferential connectivity, and in fact without any global knowledge of network state by individual ASes. Indeed, in our model, the methods by which ASes select peering partners can remain completely unspecified.

In our model, $M$ (the total number of hosts) and $N$ (the total number of ASes) are described by the simple linear growth equations $dN/dt = qN$ and $dM/dt = pM + qN$, where $q$ and $p$ are the growth parameters. We show that in the asymptotic time limit, this model leads to a stationary size distribution with power-law tails. We then show that if these growth rules are used to construct a graph, such that as each AS grows it forms links to other ASes in approximate proportion to its own size, then the resulting degree distribution also shows high variability.

We validate the degree distributions produced by this simple model using empirical measurements of AS degree distributions. For this purpose we use measurements from BGP tables stored at Route Views [28], as well as overlay maps produced by mapping routers from the Mercator [17] and Skitter [30] datasets to their corresponding ASes. We find that the resulting degree distributions in our simulated graphs are in good agreement with empirical data.

We conclude that, for topology generation, it is not necessary to incorporate preferential connectivity in order to generate highly variable AS degree distributions. This leaves the door open for more practically justified bases for forming inter-AS links, e.g., economic and geographical considerations.


In summary, in this paper we explore a model for the AS graph that is more general than the B-A model and is based on empirical observations of Internet growth dynamics. It allows inter-AS connections to be formed in a way that need not be based on AS degree. We show that it yields highly variable degree distributions, and that its outputs agree well with empirical measurements of AS graph degree distributions.

2. RELATED WORK

Until recently, Internet topologies have been generated using random and hierarchical models. Among the more significant of these is the work of Calvert et al. [8], which proposes generating smaller domain-like networks and connecting them together to create a hierarchical structure whose properties are specified by input parameters. Unfortunately, these random and hierarchical approaches do not capture many significant attributes of Internet topology as well as the power-law models [32, 24] discussed below do.

Since attention was drawn to power laws in Internet topologies by [15], modeling efforts have shifted to reproducing these power-law properties. The most notable effort in this direction has been the Barabasi-Albert preferential attachment model [5]. This model was first formulated and solved by Simon [29] and further developed by Price [12, 13]. In this model, the network is formed through incremental addition of nodes. The model's key assumption is that a new node forms connections to existing nodes based on their degree: the probability that a new node will connect to an existing node $i$ is $\Pi(i) = k_i / \sum_j k_j$, where $k_i$ is the degree of node $i$. The resulting rate at which nodes acquire new edges is given by $\partial k_i / \partial t = k_i / (2t)$, where $t$ is the time elapsed from the start of the process. The resulting degree distribution exhibits a power-law tail with a fixed exponent $\alpha = 3$.
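The preferential attachment rule above can be illustrated with a short simulation. This is a minimal sketch, not the authors' or any library's implementation; it assumes the simplest variant, in which each new node brings exactly one edge and the process starts from a single edge. Sampling uniformly from a list in which node $i$ appears $k_i$ times selects $i$ with probability $\Pi(i) = k_i / \sum_j k_j$.

```python
import random


def barabasi_albert_m1(n, seed=0):
    """Grow an n-node graph by linear preferential attachment, one edge per new node.

    Returns the list of node degrees k_i.
    """
    if n < 2:
        raise ValueError("need at least two nodes")
    rng = random.Random(seed)
    degree = [1, 1]      # seed graph: a single edge between nodes 0 and 1
    endpoints = [0, 1]   # node i appears k_i times in this list
    for new in range(2, n):
        # A uniform choice over edge endpoints picks node i with probability k_i / sum_j k_j.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree.append(1)
        endpoints.extend((target, new))
    return degree


degrees = barabasi_albert_m1(100_000)
print(len(degrees), sum(degrees), max(degrees))
```

The function name `barabasi_albert_m1` is our own. Plotting the degree counts on log-log axes is the usual way to inspect the heavy tail.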

Later work has built upon and extended the B-A model. The same authors [3] extended the model to allow rewiring, in which edges may also be deleted or moved at each timestep; this allows the exponent to vary. The work in [27] investigates the case where only a subset of all nodes in the network is available for connection, and shows that, with only slight modifications to the B-A model, a power-law degree distribution emerges. Additionally, a "generalized linear preference" model is proposed in [7] that better matches the clustering behavior and path lengths of empirical Internet measurements. These extensions have improved the flexibility of the B-A model, albeit with a corresponding increase in complexity.

The generation of power laws through random graph models has also received considerable recent attention. An overview of existing models appears in [1], along with a method that generalizes all of them; this family of models is analyzed in [21]. In these models, nodes are periodically added to the graph with some probability and are initially assigned an in-weight and out-weight of 1. At each timestep $t$, with some fixed probability, a new directed edge is created between nodes $i$ and $j$. The probabilities of selecting $i$ as the source and $j$ as the target of the edge are proportional to $i$'s out-weight and $j$'s in-weight, respectively. Then the out-weight of $i$ and the in-weight of $j$ are each increased by 1; hence, at every timestep the total in-weight (or out-weight) in the system is exactly $t$. This general method can generate graphs with arbitrary degree distributions, but these models are not proposed as realistic models for the dynamics of Internet growth.
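The weight-based growth process can be sketched as follows. The per-step node and edge probabilities (`p_node`, `p_edge`, hypothetical names), the single-node starting state, and the decision to allow self-loops are our assumptions, since the description above leaves them open; [1, 21] give the exact formulations.

```python
import random


def weighted_growth(steps, p_node, p_edge, seed=0):
    """Sketch of the in/out-weight growth process described above."""
    rng = random.Random(seed)
    out_w, in_w = [1], [1]          # start from one node with both weights equal to 1
    out_pool, in_pool = [0], [0]    # node v appears out_w[v] (resp. in_w[v]) times
    edges = []
    for _ in range(steps):
        if rng.random() < p_node:
            v = len(out_w)
            out_w.append(1)
            in_w.append(1)
            out_pool.append(v)
            in_pool.append(v)
        if rng.random() < p_edge:
            i = rng.choice(out_pool)   # source chosen in proportion to out-weight
            j = rng.choice(in_pool)    # target chosen in proportion to in-weight
            edges.append((i, j))       # self-loops (i == j) are allowed in this sketch
            out_w[i] += 1
            in_w[j] += 1
            out_pool.append(i)
            in_pool.append(j)
    return out_w, in_w, edges


out_w, in_w, edges = weighted_growth(20_000, p_node=0.2, p_edge=0.8)
print(len(out_w), len(edges), max(in_w))
```

Keeping one list entry per unit of weight makes each proportional choice a single `rng.choice` call, at the cost of memory linear in the total weight.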

In contrast to the approaches above, which focus on reproducing statistical properties, another family of models explores the implications of optimization-based algorithms for network structure. One such model has been suggested in [14]; it assumes that nodes arrive uniformly at random within some Euclidean space, and that each newly created edge balances the distance $d$ to the new neighbour against the desire to minimize the average number of hops $h$ to other nodes. A new node $i$ forms an edge to the node $j$ that minimizes the weighted sum $\gamma \cdot d_{ij} + h_j$. The resulting degree distribution exhibits a power-law tail. A second optimization-based model is described in [4]; this paper explores a similar heuristic but at the ISP level.
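A minimal sketch of this distance-versus-centrality trade-off follows. It assumes points placed uniformly in the unit square, one edge per arriving node, and $h_j$ computed as $j$'s mean hop count to the other existing nodes, as described above. The original model in [14] may define $h_j$ differently, and the function names here are hypothetical. Recomputing every hop count at each arrival keeps the code short but limits it to small graphs.

```python
import math
import random
from collections import deque


def mean_hops(adj, src):
    """Average BFS hop count from src to every other node reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = len(dist) - 1
    return sum(dist.values()) / others if others else 0.0


def optimization_growth(n, gamma, seed=0):
    """Each arriving node i links to the existing node j minimizing gamma * d_ij + h_j."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random())]   # node positions in the unit square
    adj = [[]]
    for i in range(1, n):
        here = (rng.random(), rng.random())
        hops = [mean_hops(adj, j) for j in range(i)]
        best = min(range(i), key=lambda j: gamma * math.dist(here, pos[j]) + hops[j])
        pos.append(here)
        adj.append([best])
        adj[best].append(i)
    return [len(nbrs) for nbrs in adj]


print(max(optimization_growth(50, gamma=0.0)), max(optimization_growth(100, gamma=10.0)))
```

With `gamma = 0` distance is ignored and every node attaches to the most central node, so the result is a star; larger values of `gamma` make geographic proximity count.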

The investigation in [11] evaluates the merits of the B-A model and its applicability to the Internet. The authors conclude that, while the B-A family of models does succeed in producing power laws, the model itself is not representative of the dynamics that drive Internet evolution: its growth process (preferential connectivity) does not match that observed in the Internet. They also present evidence suggesting that the AS-level degree distribution is not a pure power law, though it is still highly variable. Based on these observations, together with evidence in [31] that links degree to size, [11] suggests that other (perhaps simpler) mechanisms govern the evolution of the Internet.

The work in this paper shows that preferential connectivity, or indeed any dependence on degree in making connection decisions, is not necessary for power-law degree distributions to emerge. Furthermore, ours is the first model that captures highly variable degree distributions as well as the size and growth of autonomous systems themselves.


3. A SIMPLE GROWTH MODEL

In this section we first motivate our model using observations regarding the rates of growth of ASes and hosts over time. We then analyze the model and explore its properties.

3.1 Exponential Growth

We start by assessing the growth of the number of ASes in the Internet. For this, we look to a history of routing number allocations made publicly available by the Internet registrars¹. These agencies (ARIN, RIPE, and APNIC) are collectively responsible for assigning all Internet routing numbers. Each publishes a table of every AS number² and IP block it allocates, along with the date the allocation was made.

¹ The strengths and drawbacks of various data sources for AS tracking are discussed in [16].

² RIPE does not publish AS number allocations, though many of these allocations have been recorded by ARIN.

Using these tables, we can measure the number of AS numbers allocated at any point in time. The result is shown in Figure 1, on (a) a linear scale and (b) a semi-log scale. Here we assume that allocations provide a good estimate of the rate of growth in the total number of ASes (since we are primarily interested in the overall rate of growth). Fitting a line to the semi-log plot shows that, over the recent past, AS numbers have indeed been allocated at an exponentially growing rate. We estimate the rate of growth by the slope of the linear regression fit to the curve, approximately $8.7 \times 10^{-4}$ (units are ln(ASes)/day).

The registries provide a good record of AS births, but it is inaccurate to use their records of allocated IP blocks to estimate the growth of hosts in the Internet, because most IP blocks are not fully utilized. The best estimate of the number of Internet hosts seems to be that of the widely cited Internet Software Consortium's "Internet Domain Survey" (IDS) project. The host count they develop is based on a reverse DNS process; details can be found at [18].

Using the numbers published by IDS, we plot host growth in Figure 2 (again, (a) is linear scale and (b) is semi-log scale). Although the linear regression of the whole curve fits reasonably well, we note that the slope of the curve starting about 1996 is noticeably different from the slope before that point. Using the linear fit shown in the figure, we estimate the more conservative growth rate (the rate after 1996) to be about $1.1 \times 10^{-3}$ (units are ln(hosts)/day).

We emphasize that while the host count may well underestimate the actual number of hosts on the Internet, we are primarily interested in estimating the rate of growth represented by the slope of the curve.

Figures 1 and 2 provide strong evidence of exponential growth both in the size of the Internet and in the number of ASes. Next we construct a simple evolutionary model that relies on the observation that both measures grow exponentially.
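The growth rates quoted above are slopes of least-squares lines fitted to ln(count) against time. The sketch below shows that computation on a synthetic series (not the registry or IDS data) generated at a known rate, and converts a rate into a doubling time; the function names are our own.

```python
import math


def exponential_growth_rate(days, counts):
    """Least-squares slope of ln(count) versus day, in ln(count)/day."""
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_d = sum(days) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in days)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(days, logs))
    return sxy / sxx


def doubling_time(rate):
    """Days needed for an exponentially growing count to double."""
    return math.log(2) / rate


# Synthetic (not measured) series growing at exactly 8.7e-4 ln(ASes)/day.
days = list(range(0, 3650, 30))
counts = [1000 * math.exp(8.7e-4 * d) for d in days]
rate = exponential_growth_rate(days, counts)
print(f"fitted rate {rate:.2e}/day, doubling time {doubling_time(rate):.0f} days")
```

At the rates estimated above, the doubling times are ln 2 / (8.7 × 10⁻⁴) ≈ 797 days for the number of ASes and ln 2 / (1.1 × 10⁻³) ≈ 630 days for the number of hosts.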


Figure 1: Growth in the number of Autonomous Systems.

Figure 2: Growth in the number of Internet Hosts.

3.2 Model Development and Analysis

We wish to construct a model that builds on the observations that the number of ASes and the number of hosts in the Internet have both grown exponentially in the recent past. Let $N(t)$ be the total number of ASes and $M(t)$ be the total number of hosts (or 'mass') in the system. The simplest growth model consistent with the observations in the previous section is described mathematically by the linear equations

$$\frac{dN}{dt} = qN, \qquad \frac{dM}{dt} = pM + qN. \tag{1}$$

Here $q$ is the rate of creation of new ASes and $p$ is the rate of creation of new hosts (nodes). When a new AS is created, it begins with a single host carrying the new AS label, which explains the $qN$ term in the equation for $dM/dt$ in (1). (We assume that there is no merging of ASes; moreover, we assume that links do not affect growth processes, and that hosts and links never disappear. For a model that includes AS mergers, see [16].) Solving for $N$ and $M$ gives

$$N(t) = N(0)\,e^{qt}, \tag{2}$$

$$M(t) = A\,e^{pt} + B\,N(t), \tag{3}$$


with $A$ and $B$ being simple functions of the initial data and the parameters $p$ and $q$. (At the special point $p = q$ the coefficients diverge ($|A|, |B| \to \infty$), reflecting that the exact solution is actually a linear combination of $e^{pt}$ and $t\,e^{pt}$.) Thus the average AS size $\langle s \rangle = M(t)/N(t)$ could exhibit the following asymptotic behaviors:

$$\langle s \rangle \sim \begin{cases} \text{finite} & \text{when } p < q, \\ \ln N & \text{when } p = q, \\ N^{(p-q)/q} & \text{when } p > q. \end{cases} \tag{4}$$

In [16] we show that the average AS size grows over time (and with $N$), in agreement with measurements showing that $p > q$.
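The constants $A$ and $B$ in Eq. (3) are not written out above. Substituting the particular solution $B\,N(t)$ into $dM/dt = pM + qN$ gives $Bq = pB + q$, so $B = q/(q - p)$, and matching $M(0)$ gives $A = M(0) - B\,N(0)$; both blow up as $p \to q$. The sketch below checks this closed form against a direct fourth-order Runge-Kutta integration of Eq. (1). The initial values are arbitrary illustrative choices and the function names are our own.

```python
import math


def closed_form(t, p, q, n0, m0):
    """N(t), M(t) from Eqs. (2)-(3) with B = q/(q - p), A = M(0) - B*N(0); requires p != q."""
    b = q / (q - p)
    a = m0 - b * n0
    n = n0 * math.exp(q * t)
    return n, a * math.exp(p * t) + b * n


def integrate_rk4(t_end, p, q, n0, m0, steps=10_000):
    """Classical RK4 integration of dN/dt = qN, dM/dt = pM + qN."""
    def f(n, m):
        return q * n, p * m + q * n

    h = t_end / steps
    n, m = n0, m0
    for _ in range(steps):
        k1 = f(n, m)
        k2 = f(n + h / 2 * k1[0], m + h / 2 * k1[1])
        k3 = f(n + h / 2 * k2[0], m + h / 2 * k2[1])
        k4 = f(n + h * k3[0], m + h * k3[1])
        n += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        m += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return n, m


# Growth rates from Section 3.1; the initial values N(0), M(0) are arbitrary.
p, q, n0, m0 = 1.1e-3, 8.7e-4, 1_000, 100_000
for t in (0, 1_825, 3_650):
    n, m = closed_form(t, p, q, n0, m0)
    print(t, round(m / n, 1))   # average AS size <s> = M/N
```

With $p > q$ the printed average size $M/N$ keeps growing, as Eq. (4) predicts.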

Let $N_s(t)$ be the number of ASes with $s$ nodes. This size distribution satisfies the rate equation³

$$\frac{dN_s}{dt} = p\left[(s-1)\,N_{s-1} - s\,N_s\right] + qN\,\delta_{s,1}. \tag{5}$$

We already know $N(t) = N(0)\,e^{qt}$. Solving Eqs. (5) recursively and expressing the result in terms of $N$ rather than $t$ yields

$$N_s = n_s N + \sum_{j=1}^{s} C_{sj}\, N^{-jp/q}. \tag{6}$$


The coefficients $C_{sj}$ depend on initial conditions, while the $n_s$ are universal. Asymptotically, only the linear term $n_s N$ matters. To determine this dominant contribution, we insert $N_s(t) = n_s N(t)$ into Eq. (5), using $dN/dt = qN$. We arrive at the recursion relation

$$n_s = \frac{s-1}{s + q/p}\, n_{s-1} \tag{7}$$

for $s \ge 2$, while for $s = 1$ we have $n_1 = q/(q+p)$. The solution to recursion (7) reads

$$n_s = \frac{q}{q+p}\,\frac{\Gamma(s)\,\Gamma(2 + q/p)}{\Gamma(s + 1 + q/p)}. \tag{8}$$

Asymptotically, the ratio of gamma functions simplifies to the power law

$$n_s \simeq C\, s^{-\alpha}, \tag{9}$$

with $\alpha = 1 + q/p$ and $C = \frac{q}{q+p}\,\Gamma(2 + q/p)$. That is, the model yields an AS size distribution exhibiting a power-law tail with exponent $-(1 + q/p)$.
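The recursion (7), its closed form (8), and the asymptote (9) can be checked against each other numerically. This small sketch uses Python's `math.lgamma` to evaluate Eq. (8) without overflow. The rates are those estimated in Section 3.1, which give $\alpha = 1 + q/p \approx 1.79$; the function names are our own.

```python
import math


def size_distribution_recursive(s_max, p, q):
    """n_1 = q/(q+p) and n_s = (s-1)/(s+q/p) * n_{s-1}, Eq. (7)."""
    r = q / p
    n = [0.0, q / (q + p)]   # index 0 unused, so n[s] is n_s
    for s in range(2, s_max + 1):
        n.append((s - 1) / (s + r) * n[s - 1])
    return n


def size_distribution_closed(s, p, q):
    """Eq. (8), evaluated with log-gamma to avoid overflow for large s."""
    r = q / p
    log_ratio = math.lgamma(s) + math.lgamma(2 + r) - math.lgamma(s + 1 + r)
    return q / (q + p) * math.exp(log_ratio)


p, q = 1.1e-3, 8.7e-4                       # rates estimated in Section 3.1
alpha = 1 + q / p                           # Eq. (9) exponent
C = q / (q + p) * math.gamma(2 + q / p)     # Eq. (9) prefactor
n = size_distribution_recursive(10_000, p, q)
for s in (1, 10, 100, 10_000):
    print(s, n[s], size_distribution_closed(s, p, q), C * s ** -alpha)
print(f"alpha = {alpha:.2f}")
```

Because $\alpha < 2$ whenever $p > q$, the mean $\sum_s s\,n_s$ of this stationary distribution diverges, which is consistent with the growing average AS size in Eq. (4).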

³ In the large-time limit, the random variables $N_s(t)$ become highly localized around their corresponding average values.


4. AS DEGREE FORMATION

The previous section showed that a power-law size distribution emerges in the presence of exponential growth of ASes and hosts. In this section we extend this idea to incorporate AS degree.

The key assumption we make is that as an AS grows, it establishes links with other ASes. We show that if link formation occurs in rough proportion to an AS's growth, then the AS degree distribution will show high variability. More precisely, if each time a new host is added to an AS it forms an inter-AS link to some other randomly chosen AS with a fixed probability, then the AS degree distribution will show high variability. Furthermore, this need only be in "rough proportion"; for example, the result still holds if the connection probability varies with the log of the AS size.

Any such link formation process is simple, since it depends only on growth; it is flexible, since no influencing factors other than size are involved; and it requires no global knowledge of other ASes' degrees to make link formation decisions.

The resulting process is detailed in the algorithm below. Recall the notation from Section 3.2, where $t$ is time and $N(t)$ is the number of ASes in the system. Let $M_i(t)$ be the number of hosts in AS $i$, and let $t_i$ be the time at which AS $i$ is introduced into the system. At each timestep $t$ two kinds of events occur: some new ASes are born, and existing ASes grow. Starting at $t = 1$:


i. Calculate the total number of ASes according to $N(t) = e^{qt}$.

ii. Introduce $\lfloor N(t) \rfloor - \lfloor N(t-1) \rfloor$ new ASes, each with a size of 1 and an out-degree of 1, where the neighboring AS is chosen uniformly at random.

iii. Calculate the total number of hosts within AS $i$ according to $M_i(t) = e^{p(t - t_i)}$.

iv. For each AS $i$, insert $\lfloor M_i(t) \rfloor - \lfloor M_i(t-1) \rfloor$ new hosts. Each new host creates an inter-AS edge with probability $x$; if an edge is created, invoke a select operation to determine the AS to which the new AS-to-AS link is made.


The select operation is left unspecified to emphasize the flexibility of the link formation process and its dependence only on AS size. We consider only the simplest selection operation, in which a target AS is chosen uniformly at random.
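Steps i through iv translate into a short simulation. This is a sketch under stated assumptions, not the authors' simulator: one seed AS exists at $t = 0$ (since $N(0) = 1$), only ASes born before step $t$ grow at that step, the select operation picks another AS uniformly at random, multi-edges are allowed, and degree counts both ends of every link. The parameter `connect_prob` (a name of our own) lets the constant probability $x$ be swapped for a size-dependent rule. The rates in the usage line are scaled up only to keep the run fast; with the per-day rates of Section 3.1, `t_max` would need to be on the order of $10^4$ days to reach about $10^4$ ASes.

```python
import math
import random


def simulate_as_graph(p, q, x, t_max, connect_prob=None, seed=0):
    """Sketch of steps i-iv with a uniform-random select operation.

    connect_prob(x, size) gives the per-host linking probability; by default it
    is the constant x. Returns (sizes, degrees), one entry per AS.
    """
    rng = random.Random(seed)
    prob = connect_prob or (lambda x_, size: x_)
    birth = [0]     # t_i of each AS; one seed AS exists at t = 0 since N(0) = 1
    size = [1]      # current host count M_i of each AS
    degree = [0]

    def add_link(i):
        # select operation: an AS other than i, chosen uniformly at random
        if len(birth) < 2:
            return
        j = rng.randrange(len(birth) - 1)
        if j >= i:
            j += 1
        degree[i] += 1
        degree[j] += 1

    for t in range(1, t_max + 1):
        # Steps i-ii: floor(N(t)) - floor(N(t-1)) new ASes, each born with one link.
        born = math.floor(math.exp(q * t)) - math.floor(math.exp(q * (t - 1)))
        for _ in range(born):
            birth.append(t)
            size.append(1)
            degree.append(0)
            add_link(len(birth) - 1)
        # Steps iii-iv: each older AS gains floor(M_i(t)) - floor(M_i(t-1)) hosts.
        for i, t_i in enumerate(birth):
            if t_i == t:
                continue
            age = t - t_i
            new_hosts = math.floor(math.exp(p * age)) - math.floor(math.exp(p * (age - 1)))
            for _ in range(new_hosts):
                size[i] += 1
                if rng.random() < prob(x, size[i]):
                    add_link(i)
    return size, degree


sizes, degrees = simulate_as_graph(p=0.06, q=0.05, x=0.10, t_max=120)
print(len(sizes), max(sizes), max(degrees))
```

Because the per-step host increments telescope, each AS ends with exactly $\lfloor e^{p(t_{max} - t_i)} \rfloor$ hosts, so the sizes follow the growth model while the degrees depend only on those sizes and on $x$.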

Even though this is a random connection process, ASes that are larger in size will also have higher degree. Thus, the resulting degree distribution should be highly variable. We show in the following sections that a highly variable degree distribution does result, and that this distribution fits well when compared against distributions observed in the Internet.
5. VALIDATION

We validate our analysis and simulation results against empirical degree distributions in the following sections.


5.1 Empirical Data Sources


Figure 3: Degree Distributions Inferred from 4 Sources. (Axis label: log(degree).)

There are a number of sources from which we can draw the AS-level degree distribution. We infer empirical degree distributions through two distinct methods, applied to three different sources.

The first method is to infer AS degrees from BGP tables. For this purpose we use BGP tables from the Route Views project [28] collected in April 2001 and February 2002. An entry in a BGP table consists of an IP block represented by its prefix, followed by a sequence of ASes (an AS path) that must be traversed to reach an IP address within that range. We can infer an adjacency in the AS-level graph for a pair of ASes whenever they appear in succession within any path. While this inference method typically avoids false positives (adjacencies that appear to be present but are not actually present), it suffers from false negatives, since not all AS adjacencies are advertised across BGP [11].
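The adjacency inference rule can be sketched in a few lines. Collapsing consecutive repeats of the same AS (AS-path prepending) is our addition, not something stated above, and the paths are toy examples built from documentation-range AS numbers rather than Route Views data.

```python
from collections import Counter


def as_adjacencies(as_paths):
    """Infer undirected AS adjacencies from BGP AS paths.

    Consecutive repeats (AS-path prepending) are collapsed first so that an AS
    is not recorded as adjacent to itself.
    """
    edges = set()
    for path in as_paths:
        hops = [asn for k, asn in enumerate(path) if k == 0 or asn != path[k - 1]]
        for a, b in zip(hops, hops[1:]):
            edges.add((min(a, b), max(a, b)))
    return edges


def degrees(edges):
    """AS degree = number of distinct inferred neighbors."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count


# Toy AS paths, not real routing data.
paths = [[64500, 64501, 64502], [64500, 64500, 64503], [64501, 64502]]
edges = as_adjacencies(paths)
print(sorted(edges), dict(degrees(edges)))
```

Storing each pair in sorted order makes the adjacency undirected and removes duplicates seen in multiple paths.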

A second method for determining AS degrees is to annotate a router-level map with each router's associated autonomous system. Nodes in the router-level graph are labeled using IP addresses. In the overlay produced by annotating the router-level graph, each node is further labeled with its associated AS. The approach is detailed in [10]; we summarize it here. An IP address is associated with an autonomous system by performing a lookup in BGP tables: first, find the longest matching prefix of the IP address within the BGP table; the last entry in the corresponding path vector is the number of the AS that owns that IP address. A complete inspection of every edge in the annotated router-level graph reveals an inter-AS edge wherever the two endpoints of an edge are labeled with distinct AS numbers.
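The overlay construction can be sketched with Python's standard `ipaddress` module. The table format (prefix string plus AS path), the helper names, and the toy addresses and AS numbers are assumptions for illustration; a linear scan stands in for the radix-tree lookup a production tool would likely use.

```python
import ipaddress


def build_prefix_table(bgp_entries):
    """bgp_entries: (prefix, AS path) pairs; the last AS on the path is the origin."""
    return [(ipaddress.ip_network(prefix), path[-1]) for prefix, path in bgp_entries]


def origin_as(ip, table):
    """Longest-prefix match of ip against the table; None if nothing matches."""
    addr = ipaddress.ip_address(ip)
    best_len, best_as = -1, None
    for net, asn in table:
        if addr.version == net.version and addr in net and net.prefixlen > best_len:
            best_len, best_as = net.prefixlen, asn
    return best_as


def inter_as_edges(router_edges, table):
    """Router-level edges whose endpoints map to two different ASes."""
    edges = set()
    for u, v in router_edges:
        a, b = origin_as(u, table), origin_as(v, table)
        if a is not None and b is not None and a != b:
            edges.add((min(a, b), max(a, b)))
    return edges


table = build_prefix_table([
    ("10.0.0.0/8", [64500, 64501]),
    ("10.1.0.0/16", [64500, 64502]),
    ("192.0.2.0/24", [64503]),
])
routers = [("10.1.2.3", "10.9.9.9"), ("10.9.9.9", "10.8.8.8"),
           ("192.0.2.1", "10.1.0.1"), ("203.0.113.5", "10.9.9.9")]
print(sorted(inter_as_edges(routers, table)))
```

Here 10.1.2.3 matches both the /8 and the /16 entries and is assigned to the more specific /16, while the unmatched address 203.0.113.5 contributes no edge.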

This method has numerous advantages over AS maps inferred directly from BGP tables. It provides an AS map at a finer granularity; aggregated ASes are revealed, as are multiple links between ASes. However, this method suffers from the following drawback. Any single BGP table is potentially incomplete and can be limited by path hiding from parent ASes (in order to reduce message and table sizes). Sets of BGP tables are used to reduce the magnitude of this problem, on the premise that more BGP tables reveal more information. However, no AS can observe the existence of another AS that is hidden by its parents.

We draw on a router-level map gathered by the Mercator project [17] in August 2001, and another provided by the Skitter project [30] gathered in January 2002. Statistics, dates, and sources of all four datasets are summarized in Table 1.


| Source | ASes | Edges | Date (mm/yy) | Method |
|---|---|---|---|---|
| Route Views | 10854 | 47847 | 04/01 | BGP Adjacencies |
| Route Views | 12875 | 57385 | 02/02 | BGP Adjacencies |
| Mercator | 3478 | 13590 | 08/01 | AS Overlay |
| Skitter | 9206 | 38334 | 01/02 | AS Overlay |

Table 1: Summary of Data Sources


The degree distributions plotted in Figure 3 show that all methods and sources yield similar results. In what follows, we use the distribution drawn from the autonomous system overlay constructed from the Skitter dataset collected in January 2002 as the baseline against which simulation results are compared.


5.2 Constant Connectivity Models

Section 3.2 shows that the size distribution resulting from our model has a power-law tail. However, since the growth model does not directly describe degree, we turn to simulation to determine the influence of size and growth on degree.

The simulation is executed using the algorithm in Section 4, with rates $p = 1.1 \times 10^{-3}$ and $q = 8.7 \times 10^{-4}$ as estimated in Section 3.1. The degree distribution predicted by our model is plotted against observed degree distributions in Figure 4. We found empirically that a fixed connection probability $x = 0.10$ gives the vertices of our simulated graphs an average degree roughly commensurate with that of the Skitter dataset. Where discrepancies do occur, the general tendency is for our model to underestimate the degree of small to medium sized ASes, while overestimating the degree of larger ASes.

Figure 4 shows that the predicted degree distribution is remarkably similar to that of the Skitter dataset. The remaining discrepancies could potentially be removed by refining the decision processes used to form AS-to-AS connections in the model. In the following section, we explore a refined model that accounts for the size of the AS when determining the relationship between growth and link formation.
Figure 4: Predicted Degree Distribution where x = 0.10


5.3 Size-Based Connectivity Models

The relationship between the predicted and empirical distributions shown in Figure 4 suggests that there is room for other practical influences on inter-AS link formation. Here we discuss an approach that takes into account the actual size of the AS when choosing to create new links.

We presuppose the following notion: as an AS grows, the ratio of its degree to its size will shrink, and so a constant probability of creating new links may not best relate degree to size. Intuitively, the ratio between the degree of an AS and its size is analogous to a surface-to-volume ratio. In graph-theoretic terms, this ratio is often referred to as the conductance of a subgraph. Thus, we define the conductance of an autonomous system $i$ with size $M_i$ and out-degree $d_i$ to be

$$\frac{d_i}{M_i}.$$
Observations of conductance are estimated from the Mercator and Skitter datasets discussed in Section 5.1, and shown in Table 2. This table shows that as an autonomous system grows, the average conductance shrinks. While the actual conductance of ASes of a given size varies considerably, this trend holds on average. Note that ASes of size 1 are excluded from the smallest range, since an AS of size 1 must have a conductance of at least 1 and so could bias the observations. Also, the average conductance of the largest ASes appears to break this trend; we believe this may be an artifact of noise from a small number of data points.

We believe that this decrease in conductance is natural, driven by the decreasing need to add inter-AS links as an AS grows. For example, as previously mentioned, an AS of size 1 must have a minimum degree of 1 (otherwise it is not connected to other ASes, and hence cannot be part of the AS-level map). We speculate that hosts are more often added to a closed network to increase the capacity and range of the network itself than to connect to other ASes, and so a connection probability that decreases as an AS grows is reasonable.

| Size Range | Data Points (Mercator) | Data Points (Skitter) | Average Conductance (Mercator) | Average Conductance (Skitter) |
|---|---|---|---|---|
| 2 - 10 | 1404 | 4254 | 0.492 | 0.866 |
| 11 - 100 | 1429 | 3502 | 0.242 | 0.596 |
| 101 - 1000 | 359 | 1050 | 0.134 | 0.313 |
| 1001 - 10000 | 38 | 131 | 0.108 | 0.213 |
| 10001 - 100000 | 1 | 10 | 0.20 | 0.249 |

Table 2: Conductance of ASes

The ratios and ranges in Table 2 show diminishing conductance as AS size increases. To better fit the data observed in Table 2, we applied a logarithmic correction factor to implement a "diminishing probability" function, $L$. This function takes the size of the autonomous system, $M_i$, and a fixed probability $x$ as parameters, and returns a probability value:


\[
L(x, M_i) =
\begin{cases}
  x, & \text{when } M_i < 10,\\[4pt]
  \dfrac{x}{\log_{10}(M_i)}, & \text{otherwise.}
\end{cases}
\tag{10}
\]


As before, we use the simple select operation, which returns a
neighboring AS chosen uniformly at random.
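A minimal Python sketch of Equation (10) combined with the uniform select operation may make the mechanics concrete. The names `diminishing_probability` and `link_trial`, and the framing of a single link-formation trial for a growing AS, are illustrative assumptions; the sketch does not reproduce the paper's full growth procedure.

```python
import math
import random


def diminishing_probability(x, m_i):
    """Equation (10): x when M_i < 10, otherwise x / log10(M_i)."""
    if m_i < 10:
        return x
    return x / math.log10(m_i)


def link_trial(i, sizes, x, rng):
    """One hypothetical link-formation trial for AS i.

    With probability L(x, M_i), AS i links to another AS chosen uniformly
    at random (the simple select operation). Returns that AS's index, or
    None when no link is formed.
    """
    if len(sizes) < 2 or rng.random() >= diminishing_probability(x, sizes[i]):
        return None
    j = rng.randrange(len(sizes) - 1)  # uniform over the other ASes
    return j if j < i else j + 1       # skip index i itself


sizes = [3, 40, 2500]  # toy AS sizes
print([round(diminishing_probability(0.20, m), 3) for m in sizes])
# → [0.2, 0.125, 0.059]
rng = random.Random(1)
print(link_trial(2, sizes, 0.20, rng))  # an index in {0, 1}, or None
```

The cutoff at M_i = 10 keeps the two branches continuous, since log10(10) = 1, and it avoids dividing by log10(1) = 0 for single-host ASes.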


Figure 5: Diminishing Probability where x = 0.20 (horizontal axis: log(degree)).

The distribution that results from applying the diminishing
probability function is plotted against the Skitter data in
Figure 5, using x = 0.20, the value that provided the best fit.
The two curves are nearly identical: they share a similar slope
and are virtually indistinguishable throughout the body of the
distribution.
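The text does not specify how the best-fitting x was judged. As one hypothetical criterion (an assumption, not the authors' method), the sketch below computes the complementary cumulative distribution used for log-log degree plots and the two-sample Kolmogorov-Smirnov distance between a simulated and an observed degree sample. The function names are illustrative.

```python
from bisect import bisect_left, bisect_right


def ccdf(degrees):
    """Points (d, P[D >= d]) for each distinct degree d, for a log-log plot."""
    ds = sorted(degrees)
    n = len(ds)
    return [(d, (n - bisect_left(ds, d)) / n) for d in sorted(set(ds))]


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for t in a + b:  # the supremum is attained at one of the sample points
        fa = bisect_right(a, t) / len(a)
        fb = bisect_right(b, t) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


print(ccdf([1, 1, 2, 4]))                 # → [(1, 1.0), (2, 0.5), (4, 0.25)]
print(ks_distance([1, 2, 3], [4, 5, 6]))  # → 1.0
```

To choose x under this criterion, one would run the model for each candidate value on a grid, compute `ks_distance` against the observed degrees, and keep the minimizer; the model run itself is not shown.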


6. CONCLUSIONS

In this paper we have explored a model for how highly variable
degree distributions may arise in the AS graph. It is instructive
to compare this model with the B-A model.

Like the B-A model, we assume that high variability has arisen
via a “rich get richer” phenomenon resulting from an exponential
growth process. However, the B-A model assumes preferential
connectivity, meaning that new nodes probabilistically prefer to
connect to well-connected existing nodes. Besides requiring that
each AS be aware of the degree of every other AS (a strong
assumption of global knowledge), the B-A model strongly
constrains the resulting connection pattern. This is restrictive:
as discussed in [26], many graph realizations are consistent with
a given degree sequence, and different realizations may have very
different properties. In fact, [25] shows that the AS graph
exhibits a high degree of clustering, an effect that is not
captured by the particular connection pattern created by the B-A
model.

In contrast, the assumption in our model is that AS sizes are
the underlying cause of high variability, and that a large AS
will naturally tend to have a large degree. From this standpoint,
our model allows for a much wider range of connection patterns
than the B-A model, since the degree of an AS grows as a function
of its size, but the choice of which AS to connect to can be
specified independently, as a separate selection operation. In
this paper we have explored the selection operation in which
growing ASes choose peering partners uniformly at random;
however, we expect that any choice of peering partners that is
made without regard to degree (including choices that exhibit a
high degree of clustering) will likely show the characteristic
high variability.
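The separation between how many links an AS forms and whom it links to can be sketched with interchangeable selection functions. In this hedged illustration, `uniform_select` mirrors the degree-blind selection explored here, while `preferential_select` is a B-A-style contrast that needs every candidate's degree, i.e. the global knowledge discussed above. All names are hypothetical, and `n_links` stands in for whatever size-driven rule decides the link count.

```python
import random


def uniform_select(candidates, degrees, rng):
    """Degree-blind selection, as explored in this paper."""
    return rng.choice(candidates)


def preferential_select(candidates, degrees, rng):
    """B-A-style contrast: probability proportional to current degree.

    Needs the degree of every candidate, i.e. global knowledge.
    """
    weights = [degrees[c] for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def add_links(i, n_links, degrees, select, rng):
    """Hypothetical helper: AS i forms n_links links using `select`.

    The link count (driven by AS size in the model) and the choice of
    partner are independent arguments. Repeat partners are allowed in
    this toy version.
    """
    candidates = [c for c in range(len(degrees)) if c != i]
    for _ in range(n_links):
        j = select(candidates, degrees, rng)
        degrees[i] += 1
        degrees[j] += 1


rng = random.Random(7)
degrees = [1, 1, 1, 1]
add_links(0, 3, degrees, uniform_select, rng)
print(degrees)  # degrees[0] is now 4; 3 more endpoints spread over ASes 1-3
```

Swapping `uniform_select` for any other degree-blind rule (for example, one favoring nearby or already-clustered ASes) changes the connection pattern without touching the size-driven link count.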

Our results demonstrate that a simple and natural model
incorporating exponential growth alone is sufficient to drive both
a highly variable AS size distribution and a highly variable AS
degree distribution. We motivated this model with datasets that
demonstrate exponential growth both in the number of hosts and in
the number of ASes, and validated the model by comparing the
degree distribution our model predicts against observed degree
distributions drawn from BGP tables and AS overlay maps. We also
provide an analysis of the power-law tail of the AS size
distribution that results when our methods are employed.
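How exponential growth alone can yield a power-law size tail can be illustrated with a simplified, deterministic sketch; it is not the analysis from earlier in the paper. Assume the number of ASes has long grown as e^(βt), so the age A of a randomly chosen AS is Exponential(β), and that every AS grows from size 1 as e^(αA). Then P(S > s) = P(A > ln(s)/α) = s^(-β/α), a power-law tail with exponent β/α. The rates `alpha` and `beta` below are arbitrary illustrative values, not the empirically derived estimates.

```python
import math
import random


def simulate_sizes(alpha, beta, n, seed=0):
    """Sizes of n ASes under a simplified, deterministic growth model.

    Assumptions (illustrative): AS ages are Exponential(beta), as when the
    number of ASes has grown like e^(beta*t) for a long time, and each AS
    grows from size 1 as e^(alpha*age).
    """
    rng = random.Random(seed)
    return [math.exp(alpha * rng.expovariate(beta)) for _ in range(n)]


def tail_fraction(sizes, s):
    """Empirical P[S > s]."""
    return sum(1 for m in sizes if m > s) / len(sizes)


alpha, beta = 1.0, 0.5  # arbitrary rates; predicted tail exponent 0.5
sizes = simulate_sizes(alpha, beta, 100_000)
for s in (10, 100, 1000):
    # Empirical tail vs. the predicted power law s^(-beta/alpha).
    print(s, round(tail_fraction(sizes, s), 4), round(s ** (-beta / alpha), 4))
```

The empirical tail fractions track s^(-0.5) closely; in the model described above, the additional step is that an AS's degree grows roughly with its size, so the size tail carries over to the degree distribution.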

We have integrated this model into the publicly available BRITE
[6, 23] topology generation framework. In future work, we intend
to investigate selection operations that incorporate real-world
considerations such as locality, clustering and performance
optimization, to provide an even more realistic AS growth model.
As part of this effort, we are mining AS time-series data
extracted from BGP logs to better understand the underlying
nature of AS growth, interconnection and merging over time [16].